Abstract

Recent studies indicate that the discriminatory power of the core DNA barcodes (rbcLa + matK) for land plants
may have been overestimated, since their performance has been tested on only a few closely related species. In
this study we focused mainly on how the addition of complementary barcodes (nrITS and trnH-psbA) to
the core barcodes affects their performance in discriminating closely related species
from family to section levels. In general, we found that the core barcodes performed poorly compared to
the various combinations tested. Using multiple criteria, we finally advocate the use of the core +
trnH-psbA as a potential DNA barcode for the family Combretaceae, at least in southern Africa. Our results
also indicate that the success of DNA barcoding in discriminating closely related species may be related to
the evolutionary and possibly the biogeographic histories of the taxonomic group tested.

Keywords 

DNA barcoding, closely related species, Combretaceae, southern Africa 



Copyright Jephris Gere et al. This is an open access article distributed under the terms of the Creative Commons Attribution International License
(CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.






Jephris Gere et al. / ZooKeys 365: 127-147 (2013)



Introduction 

Combretaceae is a medium-sized family within Myrtales, comprising about 500 species
in 17 to 23 genera. It has long been referred to as a complex phylogenetic and
taxonomic group (Tan et al. 2002, Maurin et al. 2010, Stace 2010, Jordaan et al.
2011). Based on morphological characters and phylogenetic analysis, the family Combretaceae
has been recovered as monophyletic and sister to the rest of Myrtales (Brown
1810, Dahlgren and Thorne 1984, Tan et al. 2002, Sytsma et al. 2004, Maurin et al.
2010, Stace 2010). Members of Combretaceae are mainly trees, shrubs or lianas, occupying
a wide range of habitats from savannas and forests to woodlands (Maurin et al.
2010), and are distributed in tropical and subtropical regions across the globe. With ca.
350 species, Combretum Loefl., the largest genus in the family, has its centre of diversity
in Africa, with approximately 63 species described in southern Africa, the region south of the
Zambezi river that includes South Africa, Zimbabwe, Namibia, Botswana, Lesotho,
Swaziland, and Mozambique (Maurin et al. 2010, Jordaan et al. 2011).

The major distinguishing feature of the family is the presence of unicellular combretaceous
hairs on the abaxial leaf surfaces, a trait that is also diagnostic in many other species
of Myrtales and even beyond the group, e.g. in the family Cistaceae Juss., tribe Cisteae
(Maurin et al. 2010, Stace 2010). Other morphological features such as the
presence of trichomes, stalked glands, domatia, inflorescence, fruit shape, and leaf and
pollen morphology are also important for species delimitation in Combretaceae (Exell
and Stace 1966, Stace 2007, 2010, Maurin et al. 2010, Jordaan et al. 2011). Nonetheless,
none of these characters is adequate to delimit species within the family,
because none is unique to a specific clade. As a result, the family has experienced
several rounds of splitting and lumping in the past (El Ghazlai et al. 1998, Tan et al. 2002,
Maurin et al. 2010, Stace 2010, Jordaan et al. 2011). The taxonomy is further
confounded by the high morphological similarity between members of different sections.
For instance, inflorescence and fruit shapes are very similar between species and
across clades (Figures 1 and 2). Such homoplasious morphological similarities have
also been identified as the root of difficulties in delimiting the genera, for example in
the Combretum-Quisqualis clade (Jordaan et al. 2011). Consequently, it becomes necessary
to search for an alternative method to augment the traditional morphology-based
taxonomy of Combretaceae.

Here, we propose that DNA barcoding may provide such a complementary tool
to ease species delimitation within the group. DNA barcoding involves the use of a
short and standardised DNA sequence that can help assign biological specimens, even those
devoid of diagnostic features, to species (Hebert et al. 2004, 2010, Hajibabaei et al.
2006, Roy et al. 2010, Van der Bank et al. 2012, Franzini et al. 2013). Two DNA
regions defined as 'core barcodes', i.e. rbcLa and matK, have been standardised as DNA
barcodes for land plants (CBOL Plant Working Group 2009). In addition to the core
barcodes, two other regions, trnH-psbA and nrITS, were suggested as supplementary
DNA barcodes for plants (Hollingsworth et al. 2011, Li et al. 2011). The rationale for
adopting the two core regions (rbcLa and matK) is the high level of recoverability of
high-quality sequences and acceptable levels of species discrimination (Burgess et al. 2011).
The discriminatory power of the core DNA barcodes for land plants was estimated at
70-80% (CBOL Plant Working Group 2009, Fazekas et al. 2009, Kress and Erickson
2007). However, a recent study suggests that the efficacy of the core barcodes may have been
overestimated, arguing that taxon sampling has been biased towards less closely related species
(Clement and Donoghue 2012). Furthermore, barcoding efficacy is rarely evaluated in
a phylogenetic context (but see Clement and Donoghue 2012), resulting in potentially
biased estimates of discriminatory power.

Figure 1. Selected inflorescences of seven Combretum species indicating closely related species evaluated
based upon floral characters. A C. paniculatum B C. microphyllum C C. platypetalum D C. hereroense
E C. apiculatum F C. molle G C. kraussii. All photographs by O. Maurin.

In this study, we evaluated the efficacy of DNA barcoding as a tool to augment
morphological species discrimination within Combretaceae. Specifically, we (1) assessed
the potential of four markers to discriminate southern African species of the
family, and (2) assessed the efficacy of barcodes across major clades including subgenera
and sections within the largest genus Combretum.

Figure 2. Selected mature dry four-winged fruits of closely related species of genus Combretum. A C.
mkuzense B C. microphyllum C C. englerii D C. apiculatum E C. moggii F C. albopunctatum G C. collinum.
All photographs by O. Maurin.

Methods 

Sampling included one to six accessions of 58 of the 63 species representing
the six genera of Combretaceae in southern Africa. These genera are Combretum
(43 species included in this study), Lumnitzera Willd. (one species included), Meiostemon
Exell and Stace (one species included), Quisqualis L. (one species included),
Pteleopsis Engl. (two species included), and Terminalia (nine species included).

Collection details, taxonomy, voucher numbers, GPS coordinates, field pictures,
and sequence data (only matK and rbcLa) are archived online on the BOLD system
(www.boldsystems.org). Voucher information, name of herbarium, GenBank and
BOLD accession numbers are listed in Appendix 1.

DNA extraction, amplification and alignment

Genomic DNA was extracted from silica gel-dried and herbarium leaf material following a
modified cetyltrimethyl ammonium bromide (CTAB) method of Doyle and Doyle (1987).
To reduce the effects of high polysaccharide concentrations in the DNA samples, we added
polyvinylpyrrolidone (2% PVP). Samples were purified using QIAquick purification
columns (Qiagen, Inc, Hilden, Germany) following the manufacturer's protocol.

All PCR reactions were carried out using Ready Master Mix (Advanced Biotechnologies,
Epsom, Surrey, UK). We added 4.5% dimethyl sulfoxide (DMSO) to
the PCR reactions of nrITS to improve PCR efficiency. Amplification of rbcLa was
done using the primer combination 1F: 724R (Olmstead et al. 1992, Fay et al. 1998).
For matK, the primer combination 390F: 1326R was used (Cuenoud et al.
2002). Intergenic spacers trnH-psbA and psaA-ycf3 were amplified using the primers
trnH: psbA (Sang et al. 1997) and PG1F: PG2R (Huang and Shi 2002), respectively.
Intergenic spacer psaA-ycf3 was included in this study for the purpose of reconstructing
the phylogeny of Combretaceae. The nrITS region was amplified in two overlapping
fragments using the following two pairs of internal primer combinations: 101F: 2R
and 3F: 102R (White et al. 1990, Sun et al. 1994).

The following programme was used to amplify rbcLa and trnH-psbA: pre-melt at
94 °C for 60 s, denaturation at 94 °C for 60 s, annealing at 48 °C for 60 s, extension
at 72 °C for 60 s (for 28 cycles), followed by a final extension at 72 °C for 7 min. For
matK, the protocol consisted of pre-melt at 94 °C for 3 min, denaturation at 94 °C
for 60 s, annealing at 52 °C for 60 s, extension at 72 °C for 2 min (for 30 cycles), and final
extension at 72 °C for 7 min. For nrITS and spacer psaA-ycf3, the protocol consisted of
pre-melt at 94 °C for 1 min, denaturation at 94 °C for 60 s, annealing at 48 °C for 60
s, extension at 72 °C for 3 min (for 26 cycles), and final extension at 72 °C for 7 min.
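
For readers who wish to reproduce the cycling conditions, the three programmes above can be written down as plain data. The Python sketch below is only an illustration: the dictionary layout and the function name hold_time_minutes are hypothetical, and the computed times count programmed hold steps only (temperature ramping between steps, which depends on the thermocycler, is ignored).

```python
# PCR programmes transcribed from the text; all durations are in seconds.
# "steps" holds (denaturation, annealing, extension) for one cycle.
# The data layout and names are illustrative assumptions, not the authors' files.
PROGRAMMES = {
    "rbcLa / trnH-psbA": {
        "pre_melt": 60, "cycles": 28, "steps": (60, 60, 60),
        "annealing_celsius": 48, "final_extension": 7 * 60,
    },
    "matK": {
        "pre_melt": 3 * 60, "cycles": 30, "steps": (60, 60, 2 * 60),
        "annealing_celsius": 52, "final_extension": 7 * 60,
    },
    "nrITS / psaA-ycf3": {
        "pre_melt": 60, "cycles": 26, "steps": (60, 60, 3 * 60),
        "annealing_celsius": 48, "final_extension": 7 * 60,
    },
}


def hold_time_minutes(programme):
    """Total programmed hold time in minutes, ignoring ramp times."""
    cycling = programme["cycles"] * sum(programme["steps"])
    total_seconds = programme["pre_melt"] + cycling + programme["final_extension"]
    return total_seconds / 60


if __name__ == "__main__":
    for name, programme in PROGRAMMES.items():
        print(f"{name}: {hold_time_minutes(programme):.0f} min")
```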

Purification of the amplified products was done using QIAquick columns (Qiagen,
Germany) following the manufacturer's manual. The purified products were then
cycle-sequenced with the same primers used for amplification using BigDye™ v3.1
Terminator Mix (Applied Biosystems, Inc, ABI, Warrington, Cheshire, UK). Cleaning
of cycle-sequenced products was done using EtOH-NaCl, followed by sequencing
on an ABI 3130xl genetic analyser.

Sequences were assembled, trimmed and edited using Sequencher v4.6 (Gene
Codes Corp, Ann Arbor, Michigan, USA). Alignment was done using Multiple Sequence
Comparison by Log-Expectation (MUSCLE) v3.8.31 (Edgar 2004), followed by
manual adjustments to refine the alignments.



Data analysis 

Performance of DNA markers in species delimitation was tested at three taxonomic
levels (family, subgenus, and section). At family level, we evaluated four single markers:
rbcLa, matK, trnH-psbA, and nrITS. We also tested the core barcodes, i.e. rbcLa + matK
(CBOL Plant Working Group 2009), and the following combinations: core + nrITS,
core + trnH-psbA, and core + trnH-psbA + nrITS. Four criteria were used to assess their
barcoding potential: presence of a 'barcode gap' (Meyer and Paulay 2005), discriminatory
power, species monophyly, and PCR success rate.

The barcode gap was evaluated in two ways. (1) We compared genetic variation within
species (intraspecific genetic distance) versus between species (interspecific genetic
distance), based on the mean, median, and range of both distances.
(2) We also used the approach of Meier et al. (2008), which evaluates the gap by
comparing the smallest interspecific distance with the greatest intraspecific distance. The
genetic distances were calculated using the Kimura 2-parameter (K2P) model. We also
assessed the index of sequence divergence, K, for each region, measured as the mean
number of substitutions between any two sequences.
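
To make the distance measure concrete, the K2P distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below is an illustration only and not the software used in this study; the function name k2p_distance is hypothetical, and skipping gapped or ambiguous sites pair by pair is an assumption of the sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For weakly diverged sequences such as those compared here, the K2P distance is close to the uncorrected proportion of differing sites; the correction grows as divergence increases.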

The discriminatory power of the DNA regions was evaluated using three distance-based
methods: Near Neighbour, Best Close Match (Meier et al. 2006),
and the BOLD identification criteria. A good barcode should exhibit the highest
rate of correct species identification by assigning the highest proportion of DNA
sequences to the corresponding species names. All sequences were labelled according
to species names prior to testing. For the Best Close Match test, we determined,
for each dataset (family, subgenera and sections), the optimised genetic
distance suitable as a threshold for species delimitation. Optimised thresholds were
determined using the function "localMinima" implemented in the R package Spider
1.1-1 (Brown et al. 2012).
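
The logic of two of these tests can be sketched from a pairwise distance matrix. The Python code below is a simplified illustration following the description of Meier et al. (2006), not the Spider implementation used in this study; the function names, the tie handling in the Near Neighbour test (the first nearest sequence is taken), and the use of "less than or equal to" for the threshold are assumptions of the sketch.

```python
def nearest_neighbour(dist, species):
    """For each sequence, report whether its nearest neighbour is conspecific."""
    results = []
    n = len(species)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Assumption: on ties, the first nearest sequence is used.
        nearest = min(others, key=lambda j: dist[i][j])
        results.append(species[nearest] == species[i])
    return results


def best_close_match(dist, species, threshold):
    """Classify each sequence as correct, incorrect, ambiguous or no id."""
    results = []
    n = len(species)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        best = min(dist[i][j] for j in others)
        if best > threshold:
            # The closest sequence is too distant to be used as a match.
            results.append("no id")
            continue
        # Species names of all sequences tied at the closest distance.
        matches = {species[j] for j in others if dist[i][j] == best}
        if len(matches) > 1:
            results.append("ambiguous")
        elif matches == {species[i]}:
            results.append("correct")
        else:
            results.append("incorrect")
    return results
```

Here dist is a symmetric matrix of distances expressed as proportions, so a threshold reported as 0.5% corresponds to threshold=0.005.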

We also used the PCR success rate to evaluate the DNA regions. This evaluation 
was conducted based on the percentage of successful amplification. 

The test for species monophyly was conducted on a Neighbour-Joining (NJ) tree.
We considered a species to be monophyletic when all individuals of that species
clustered together on the NJ phylogram that we reconstructed. As such, the best barcode should
provide the highest proportion of monophyletic species. We then evaluated, for each
DNA region and for the concatenated regions, the proportion of monophyletic (i.e. correct
identification) and non-monophyletic species (incorrect identification). All our analyses
were conducted in the R package Spider 1.1-1 (Brown et al. 2012).
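
The monophyly criterion can be illustrated with a small Python sketch. It is not the Spider routine used in this study: the tree is assumed to be rooted and is represented as nested tuples of tip labels (a hypothetical encoding chosen for brevity), and a species represented by a single accession is counted as monophyletic by construction.

```python
def clade_tip_sets(tree):
    """Return the tips under this node and the tip sets of all clades below it."""
    if isinstance(tree, str):
        tips = frozenset([tree])
        return tips, [tips]
    tips = frozenset()
    clades = []
    for child in tree:
        child_tips, child_clades = clade_tip_sets(child)
        tips |= child_tips
        clades.extend(child_clades)
    clades.append(tips)
    return tips, clades


def monophyly(tree, tip_species):
    """Map each species to True if its accessions form exactly one clade."""
    _, clades = clade_tip_sets(tree)
    clade_set = set(clades)
    by_species = {}
    for tip, name in tip_species.items():
        by_species.setdefault(name, set()).add(tip)
    return {name: frozenset(tips) in clade_set for name, tips in by_species.items()}


def proportion_monophyletic(tree, tip_species):
    """Percentage of species recovered as monophyletic."""
    flags = monophyly(tree, tip_species)
    return 100 * sum(flags.values()) / len(flags)
```

Because an NJ tree is unrooted, the outcome of such a test can depend on where the tree is rooted; the sketch simply takes the rooting as given.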

Finally, we evaluated the barcoding potential in discriminating phylogenetically
delimited clades in the phylogeny of the genus that was reconstructed based on
the combination of five DNA regions (rbcL, matK, trnH-psbA, psaA-ycf3 and nrITS).
The phylogeny was reconstructed based on maximum parsimony (MP) implemented
in PAUP* v4.0b10 (Swofford 2002). Tree searches were conducted using heuristic
searches with 1000 random sequence additions, retaining 10 trees per replicate, with
tree-bisection-reconnection (TBR) branch swapping and MulTrees in effect (saving
multiple equally parsimonious trees). Based on Maurin et al. (2010), we used Strephonema
mannii Hook. f. and S. pseudocola A. Chev. as outgroups. Node support was
assessed using bootstrap (BP) values: BP > 70% for strong support (Hillis and Bull
1993, Wilcox et al. 2002).

At subgeneric and sectional levels, we only tested the performance of the core barcodes
and the best gene combination identified using the three criteria mentioned above (barcode
gap, discriminatory power and species monophyly).



Results 

The overall characteristics of single and combined DNA regions are presented in Table 1.
In general, our results indicate that the ranges and mean intraspecific distances were both
lower than those of interspecific distances. Among single regions, rbcLa showed the lowest
interspecific distance (mean = 0.009), with nrITS exhibiting the highest genetic variation
between species (mean = 0.110). For all marker combinations, the mean interspecific
distances varied between 0.011 and 0.014. Assessing the index of sequence divergence
K for each region, we found that nrITS showed the highest divergence (K = 21) whereas
trnH-psbA exhibited the lowest divergence (K = 3). For the combined regions, K varied
between 10 and 13, with an average of 10 substitutions between sequence-pairs (Table 1).
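
The index K, as defined in the Methods, is a mean count of differing sites over all sequence pairs. A minimal Python sketch under stated assumptions follows: the function name is hypothetical, sequences are assumed to be aligned, and sites involving gaps or ambiguity codes are not counted as substitutions.

```python
from itertools import combinations


def mean_pairwise_substitutions(sequences):
    """Mean number of differing sites between any two aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total = 0
    for a, b in pairs:
        # Assumption: only unambiguous bases are compared; gaps are ignored.
        total += sum(
            1 for x, y in zip(a, b)
            if x != y and x in "ACGT" and y in "ACGT"
        )
    return total / len(pairs)
```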

The distribution ranges of inter- versus intraspecific distances for all regions
overlapped (Figures 3a, b and 4), but interspecific distances were on average larger
than intraspecific distances, which we interpret as a barcode gap. Comparing the smallest
inter- versus the largest intraspecific distances for each region, our results further
support the existence of a barcode gap in
all regions, but the proportion of sequences with a barcode gap varied markedly with
the regions tested (Table 2). Notably, the combination of all four regions exhibited the
highest proportion of sequences with a barcode gap (84%), followed by nrITS (73%),
then core + nrITS (64%), and core + trnH-psbA (57%), with the lowest proportion
found in rbcLa (13%) (Table 2).
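
The per-sequence comparison behind Table 2 can be sketched as follows. This Python illustration is not the code used in this study: the function names are hypothetical, distances are assumed to be given as a symmetric matrix, and a sequence with no conspecific in the dataset is assigned a largest intraspecific distance of zero, which is an assumption of the sketch.

```python
def has_barcode_gap(dist, species):
    """Per sequence: is the smallest interspecific distance larger than
    the largest intraspecific distance?"""
    n = len(species)
    flags = []
    for i in range(n):
        intra = [dist[i][j] for j in range(n)
                 if j != i and species[j] == species[i]]
        inter = [dist[i][j] for j in range(n) if species[j] != species[i]]
        if not inter:
            raise ValueError("at least two species are required")
        # Assumption: singletons get a largest intraspecific distance of 0.
        flags.append(min(inter) > max(intra, default=0.0))
    return flags


def percent_with_gap(dist, species):
    """Percentage of sequences that show a barcode gap."""
    flags = has_barcode_gap(dist, species)
    return 100 * sum(flags) / len(flags)
```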

Optimised genetic distances used as thresholds for species delimitation in the Best
Close Match method are shown in Table 1. Apart from rbcLa (threshold = 0.04%),
core + trnH-psbA (threshold = 0.5%) and core + nrITS (threshold = 0.7%), the thresholds
for the remaining single regions and gene combinations were greater than 1%.
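
The idea behind a density-based threshold can be illustrated in Python: estimate the density of all pairwise distances and look for dips between the intraspecific and interspecific peaks. The sketch below is not the "localMinima" function of Spider; the Gaussian kernel, the explicit bandwidth argument, the grid size and the function names are all assumptions made for illustration.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of values, evaluated on grid."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


def local_minima(values, bandwidth, points=200):
    """Grid positions where the estimated density has a local minimum."""
    low, high = min(values), max(values)
    step = (high - low) / (points - 1)
    grid = [low + k * step for k in range(points)]
    density = gaussian_kde(values, grid, bandwidth)
    return [
        grid[k] for k in range(1, points - 1)
        if density[k] < density[k - 1] and density[k] < density[k + 1]
    ]
```

Under this reading, the first dip in the density separates mostly intraspecific from mostly interspecific distances and is a candidate threshold; the result depends on the bandwidth chosen.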



Table 1. Statistics of all gene regions for the southern African Combretaceae included in the study.

DNA regions                 No. of seq  Seq length  K   Range inter  Mean inter (±SD)  Range intra  Mean intra (±SD)  Threshold (%)
rbcLa                       152         552         4   0-0.09       0.009±0.012       0-0.08       0.002±0.009       0.04
matK                        133         771         6   0-0.07       0.014±0.011       0-0.02       0.002±0.004       1.10
trnH-psbA                   116         1034        3   0-0.15       0.047±0.035       0-0.03       0.003±0.007       1.80
nrITS                       91          821         21  0-0.21       0.110±0.045       0-0.05       0.004±0.010       1.70
rbcLa+matK                  129         1323        10  0-0.78       0.012±0.009       0-0.05       0.002±0.006       1.31
rbcLa+matK+trnH-psbA        87          2358        11  0-0.04       0.012±0.007       0-0.02       0.002±0.004       0.5
rbcLa+matK+nrITS            74          2144        9   0-0.04       0.011±0.006       0-0.02       0.002±0.004       0.70
rbcLa+matK+nrITS+trnH-psbA  70          3178        13  0-0.04       0.014±0.007       0-0.02       0.002±0.004       1.17

Table 2. Percentage barcode gap in all sequences for each region using the Meier et al. (2008) approach.

DNA region                  Number of sequences without gap  Proportion of sequences with gap (%)
rbcLa                       132                              13
matK                        [illegible]                      [illegible]
trnH-psbA                   54                               53
nrITS                       25                               73
rbcLa+matK                  82                               36
rbcLa+matK+trnH-psbA        37                               57
rbcLa+matK+nrITS            27                               64
rbcLa+matK+nrITS+trnH-psbA  11                               84


Table 3. Identification efficacy of DNA barcodes using distance based methods. F = False and T = True.

                            Near Neighbour  BOLD (1%)                                             Best Close Match
DNA region                  F (%)   T (%)   Ambiguous (%)  Correct (%)  Incorrect (%)  No ID (%)  Ambiguous (%)  Correct (%)  Incorrect (%)  No ID (%)
rbcLa                       59      41      61             18           14             7          61             18           14             7
matK                        46      54      81             11           7              1          47             38           14             1
trnH-psbA                   72      28      65             22           10             3          18             60           18             4
nrITS                       35      65      29             47           10             14         10             63           19             8
rbcLa+matK                  39      61      86             10           2              2          35             51           12             2
rbcLa+matK+trnH-psbA        38      62      79             16           2              3          6              80           8              6
rbcLa+matK+nrITS            43      57      62             30           7              1          3              70           19             8
rbcLa+matK+nrITS+trnH-psbA  36      64      52             41           3              4          0              87           9              4


Our results for the discriminatory power analysis at family level varied with the methods applied
(Table 3). Based on the Near Neighbour method, nrITS provided the
highest discriminatory power (65%), followed by rbcLa + matK + trnH-psbA + nrITS
(64%), rbcLa + matK + trnH-psbA (62%), and rbcLa + matK (61%). The lowest discriminatory
power was found for trnH-psbA (28%).

The BOLD species delimitation criterion of a 1% threshold provided the lowest rate of
correct identification among all three methods used. However, we found that nrITS
remained the most efficient region, with 47% discriminatory power. The second most
successful combination of regions was core + trnH-psbA + nrITS (41%), followed by
core + nrITS (30%) and trnH-psbA (22%); the core barcodes were identified as the
least performing regions (10%), with the highest proportion of ambiguity (86%).

In contrast to the two previous methods, the Best Close Match provided the highest
rate of species discrimination for the combined dataset (core + trnH-psbA + nrITS),
yielding the best discriminatory power (87%) with no ambiguity. This was followed
by core + trnH-psbA (80%), core + nrITS (70%) and nrITS (63%), with the poorest
performance for rbcLa (18%) at family level.

Figure 3. Comparisons of the distribution range of inter- versus intraspecific distances using boxplots.
a indicates comparison of single barcode gene regions; b indicates the results of gene combinations.



The last criterion used to evaluate the potential of the DNA regions was PCR efficiency.
We found that rbcLa (87%), followed by trnH-psbA (85%) and matK (68%), were easy
to amplify, with nrITS being the most difficult (47%; Figure 5).

We complemented the previous analyses using the species monophyly criterion after verifying
the monophyly of Combretaceae. Among all regions, core + trnH-psbA recovered
the highest proportion of monophyletic species (83%), followed by trnH-psbA (78%),
nrITS (76%), and the combination of all four regions (65%). Again, rbcLa provided the
lowest performance in identifying species as monophyletic (37%; Figure 6).

In summary, all regions provided evidence for barcode gaps (Figures 3a, b and 4),
but the strength of evidence varied with the approaches used. Furthermore, the Best Close
Match method provided the highest identification accuracy among the three distance-based
methods for the three- and four-region combinations (Table 3). Under this method,
the two best potential barcodes for southern African Combretaceae were core +
trnH-psbA + nrITS (87%) and core + trnH-psbA (80%). However, based on the species monophyly
criterion, the single region trnH-psbA and the combination core + trnH-psbA
showed high barcode potential, with trnH-psbA being the second easiest region to amplify
after rbcLa.

Figure 4. Relationships between inter- and intraspecific distances indicating barcoding gap for all
regions tested.

Figure 5. PCR efficiency for the four candidate barcodes (rbcLa, matK, trnH-psbA, nrITS).

Figure 6. Gene performance based on monophyly criteria. False = proportion of non-monophyletic species;
True = proportion of monophyletic species.

We further evaluated the potential of each region as a candidate barcode using a
phylogeny of southern African Combretaceae (Appendix 2). Our results are congruent
with the corresponding subset in the most recent and largest phylogeny assembled for
the family (Appendix 3). Our evaluation of the discriminatory power at subgeneric
level using the thresholds determined for the family (1.31% for the core and 0.5% for
the core + trnH-psbA) revealed that the core barcodes alone were able to correctly identify
78% of species within the subgenus Cacoucia. However, the core barcodes could
discriminate only 50% of species within the subgenus Combretum. Notably, the
discriminatory power within both subgenera increased markedly
to 100% when we added the trnH-psbA region (Table 4). This trend was consistent
even when we applied the thresholds that had been optimised for the subgenera.

At sectional level, we observed similar trends: the addition of trnH-psbA increased
the performance of the core barcodes drastically, except for Macrostigmatea (Table 5):
Angustimarginata (core: 11%; core + trnH-psbA: 86%); Ciliatipetala (core: 55%; core
+ trnH-psbA: 73%); Conniventia (core: 38%; core + trnH-psbA: 88%); Hypocrateropsis
(core: 63%; core + trnH-psbA: 80%). However, Macrostigmatea (core: 34%; core +
trnH-psbA: 44%) showed the poorest performance even with the addition of trnH-psbA to
the core barcodes, with an increment of just 10 percentage points. This trend is not sensitive
to whether the thresholds applied are those of the family or of the sections.

Table 4. Comparisons of efficacy of core barcodes and best barcode within subgenera Combretum and
Cacoucia. The last four columns report Best Close Match results.

Subgenus   DNA region            No. of seq  Mean inter (±SD)  Threshold (%)  Ambiguous (%)  Correct (%)  Incorrect (%)  No ID (%)
Cacoucia   rbcLa+matK            23          0.004±0.002       1.31           13             78           9              0
Cacoucia   rbcLa+matK+trnH-psbA  16          0.006±0.002       0.5            0              100          0              0
Combretum  rbcLa+matK            84          0.009±0.009       1.31           36             50           12             2
Combretum  rbcLa+matK+trnH-psbA  16          0.006±0.002       0.5            0              100          0              0

Table 5. Comparisons of core barcodes and the best barcode within five sections of the subgenera Combretum
and Cacoucia. The last four columns report Best Close Match results.

Sections          DNA regions           No. of seq  Mean inter (±SD)  Threshold (%)  Ambiguous (%)  Correct (%)  Incorrect (%)  No ID (%)
Angustimarginata  rbcLa+matK            19          0.007±0.014       2.6            58             11           26             5
Angustimarginata  rbcLa+matK+trnH-psbA  15          0.006±0.006       0.7            0              86           7              7
Ciliatipetala     rbcLa+matK            20          0.004±0.002       0.3            45             55           0              0
Ciliatipetala     rbcLa+matK+trnH-psbA  15          0.006±0.003       0.5            0              73           27             0
Conniventia       rbcLa+matK            8           0.005±0.004       0.8            37             38           12             13
Conniventia       rbcLa+matK+trnH-psbA  8           0.010±0.006       2.4            0              88           12             0
Hypocrateropsis   rbcLa+matK            8           0.012±0.005       1.31           25             63           12             0
Hypocrateropsis   rbcLa+matK+trnH-psbA  5           0.020±0.004       0.8            0              80           20             0
Macrostigmatea    rbcLa+matK            15          0.002±0.001       0.1            53             34           13             0
Macrostigmatea    rbcLa+matK+trnH-psbA  9           0.003±0.002       0.2            0              44           56             0

(Only sections with at least three different species are included).

Finally, we compared the mean number of substitutions between any two species 
within each section. We found that the mean number of substitutions between repre- 
sentatives of Macrostigmatea is lowest (mean = 4) whereas it ranges between 5 and 19 
substitutions in other sections of subgenus Combretum. 



Discussion 

We evaluated genetic variation for both single and various combinations of rbcLa,
matK, trnH-psbA and nrITS. Comparing ranges of intra- versus interspecific distances,
our results indicate that all markers show a barcode gap (Meyer and Paulay 2005), and
this is also true for the more stringent approach of Meier et al. (2008), although the proportion
of sequences with a gap varies greatly with the marker used.

The discriminatory power of the DNA regions in species identification also varies
with the distance-based methods applied. Of the methods tested, Near Neighbour
and Best Close Match yielded high performance, with the latter giving the best results
for the three- and four-region combinations. The core barcodes were
not among the three best options, and their discriminatory power has been
questioned in a number of studies (Hollingsworth et al. 2009, Pettengill and Neel
2010, Roy et al. 2010, Wang et al. 2010, Clement and Donoghue 2012). Based on all
three distance methods, nrITS emerges as the most suitable single region (as indicated
under both Near Neighbour and BOLD; see also Kress et al. 2005, Kress and Erickson
2007, Chen et al. 2010, Gao et al. 2010, Ren et al. 2010, China Plant BOL Group
et al. 2011, Muellner et al. 2011, Pang et al. 2011, Wang et al. 2011, Liu et al. 2012,
Yang et al. 2012). Among combined regions, core + nrITS + trnH-psbA (under Best
Close Match) emerges as the most suitable for barcoding Combretaceae.

However, our study indicates some important drawbacks that argue against the inclusion
of nrITS as a good barcode. For example, based on amplification success,
nrITS was the most difficult of all regions tested, with rbcLa and trnH-psbA being the
easiest regions to amplify. The technical hurdles in PCR amplification and sequencing
of nrITS may be linked to the presence of retrotransposons and other repetitive elements
within plant nuclear genomes, resulting in paralogous gene copies (Gao et al.
2010, Hollingsworth 2011, Hollingsworth et al. 2011, Li et al. 2011). This is likely
the case for nrITS in Combretaceae, as we found evidence of multiple copies that may
not be identical to each other (see CBOL Plant Working Group 2009, Hollingsworth
2011, Hollingsworth et al. 2011, Yang and Berry 2011). As such, the addition of
trnH-psbA to the core barcodes (rbcLa + matK + trnH-psbA) emerges as the best gene
combination for species discovery and delimitation in Combretaceae (see also
Newmaster and Ragupathy 2009, Petit and Excoffier 2009, Ragupathy et al. 2009,
Wang et al. 2009, Arca et al. 2012).

Previous studies have shown that core barcodes are very limited in discriminating 
taxa that are phylogenetically closely related, and suggested that the efficacy of DNA 
barcodes should be tested within a phylogenetic context (Clement and Donoghue 
2012). We tested this using subgenera and sections of the family Combretaceae. Our 
evaluation of the discriminatory power of the core barcodes at subgeneric level re- 
vealed a striking difference in the performance between the two Combretum subgenera, 
Combretum and Cacoucia. The difference noted for the discriminatory power of the 
core barcodes between the two subgenera may reflect differences in their evolutionary 
history. Indeed, the latest dated phylogeny of Combretaceae indicated that members 
of the subgenus Cacoucia are represented with longer terminal branches than those in 
subgenus Combretum (Maurin 2009). 

While we found poor performance of the core barcodes at sectional level, for example in
Angustimarginata, Macrostigmatea and Conniventia, this result is not unexpected given the very low
genetic variation one could expect within clades (see Ennos et al. 2005, Clement and
Donoghue 2012). However, the addition of trnH-psbA to the core barcodes results in
a drastic increase in identification rate at both subgenus and sectional levels, validating
the utility of trnH-psbA to discriminate even closely related species, except for section
Macrostigmatea (Newmaster and Ragupathy 2009, Petit and Excoffier 2009, Ragupathy
et al. 2009, Wang et al. 2009, but see Arca et al. 2012, Clement and Donoghue
2012, Zhang et al. 2012).

The result for section Macrostigmatea reflects the confusion noted in previous studies
regarding its composition (Stace 1980, Maurin et al. 2010, Jordaan et al. 2011). In our
analysis, we included Spathulipetala within section Macrostigmatea based on suggestions
from recent molecular evidence (Maurin et al. 2010). Morphological studies separate
these two sections, Spathulipetala and Macrostigmatea (Stace 1980, Jordaan et al.
2011). Section Spathulipetala comprises two members, Combretum zeyheri Sond. and
C. mkuzense J.D.Carr and Retief, which occur in the same geographical location and
show close morphological similarity in their fruits (Jordaan et al. 2011). The inclusion
of C. mkuzense in this section has been controversial, with some authors (Exell 1978,
Stace 1980) advocating a tentative placement pending further investigation. However,
a recent molecular study shows a close relationship between these two species (Combretum
zeyheri and C. mkuzense) (Maurin et al. 2010), which gives support to the earlier
morphological treatment. On the other hand, the taxonomy of section Macrostigmatea
appears to pose fewer challenges than that of Spathulipetala. A recent molecular
study (Maurin et al. 2010) suggests lumping these two sections, Spathulipetala and
Macrostigmatea, as their members appear embedded in one clade with a high bootstrap
support of 100%. Earlier, Exell (1978) had reported that the sections are closely related,
as they share similarities in scale size, scale fragmentation into fruit walls and fruit size.

Our results reflect the unclear taxonomy reported for section Macrostigmatea,
indicating a need for further molecular analyses involving more taxa and
gene sequences to correctly determine the members of this section. Our results also
support the proposal of Exell (1978) to lump these two sections. The low performance of
the core + trnH-psbA in fully discriminating the different species within this section is
a strong indicator of the close phylogenetic similarity of the species. Our results indicate
the utility of DNA barcoding data not only for discriminating species, but also for
detecting species that require further molecular analyses.



Conclusions 

Our analysis indicates that the poor performance of the core barcodes at family level
cannot be generalised to lower levels: the core barcodes perform poorly in some
sections but show strong discriminatory power in others. Such findings may indicate
that the success of DNA barcodes in discriminating closely related species, at least in
plants, may correlate with the evolutionary distinctiveness of the group tested and, as
recently indicated (see Clement and Donoghue 2012), may also reflect different
biogeographic histories between clades within Combretaceae. Overall, we
propose the core + trnH-psbA as the best barcode for the family Combretaceae.

Appendix 1

Supplementary Table S1. (doi: 10.3897/zookeys.365.5728.app1) File format: Microsoft
Excel file (xls).

Explanation note: Full names, voucher information, GenBank and BOLD accession 
numbers for taxa used in this study. A dash ( — ) indicates DNA regions not sampled 
and DNA sequences obtained from GenBank are underlined. Voucher specimens are 
deposited in the following herbaria: JRAU, University of Johannesburg (UJ), Johan- 
nesburg, South Africa; MO, Missouri Botanical Garden, St Louis, USA. 

Copyright notice: This dataset is made available under the Open Database License 
(http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License
(ODbL) is a license agreement intended to allow users to freely share, modify, and use 
this Dataset while maintaining this same freedom for others, provided that the original 
source and author(s) are credited. 



Citation: Gere J, Yessoufou K, Daru BH, Mankga LT, Maurin O, van der Bank M (2013) Incorporating trnH-psbA to core
DNA barcodes improves significantly species discrimination within southern African Combretaceae. In: Nagy ZT, Backeljau
T, De Meyer M, Jordaens K (Eds) DNA barcoding: a practical tool for fundamental and applied biodiversity research.
ZooKeys 365: 127-147. doi: 10.3897/zookeys.365.5728 Supplementary Table S1. doi: 10.3897/zookeys.365.5728.app1



Appendix 2 

Supplementary Figure S1. (doi: 10.3897/zookeys.365.5728.app2) File format: Microsoft
Word file (docx).

Explanation note: One of the most parsimonious trees obtained from the combined plastid
and nuclear data set (rbcLa, matK, trnH-psbA, and nrITS). Clades highlighted
indicate the sections that were identified from the MP tree obtained from barcoding 
gene regions. Bootstrap percentages above 50% are shown above the branches. 

Copyright notice: This dataset is made available under the Open Database License 
(http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License
(ODbL) is a license agreement intended to allow users to freely share, modify, and use 
this Dataset while maintaining this same freedom for others, provided that the original 
source and author(s) are credited. 



Citation: Gere J, Yessoufou K, Daru BH, Mankga LT, Maurin O, van der Bank M (2013) Incorporating trnH-psbA to core
DNA barcodes improves significantly species discrimination within southern African Combretaceae. In: Nagy ZT, Backeljau
T, De Meyer M, Jordaens K (Eds) DNA barcoding: a practical tool for fundamental and applied biodiversity research.
ZooKeys 365: 127-147. doi: 10.3897/zookeys.365.5728 Supplementary Figure S1. doi: 10.3897/zookeys.365.5728.app2

Appendix 3

Supplementary Figure S2. (doi: 10.3897/zookeys.365.5728.app3) File format: Microsoft 
Word file (docx). 

Explanation note: One of the most parsimonious trees, with branch tips collapsed, from
the combined plastid and nuclear data set (rbcL, matK, psaA-ycf3, trnH-psbA, and nrITS).
Clades highlighted indicate sections that were identified from the MP tree obtained
from barcoding gene regions. Above the branches are Bayesian posterior probability
(PP) values (> 0.5) and below are bootstrap percentages above 50%.

Copyright notice: This dataset is made available under the Open Database License 
(http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License
(ODbL) is a license agreement intended to allow users to freely share, modify, and use 
this Dataset while maintaining this same freedom for others, provided that the original 
source and author(s) are credited. 